Frequent Itemset Discovery with SQL Using Universal Quantification

Author

  • Ralf Rantzau
Abstract

Algorithms for finding frequent itemsets fall into two broad classes: (1) algorithms that are based on non-trivial SQL statements to query and update a database, and (2) algorithms that employ sophisticated in-memory data structures, where the data is stored in and retrieved from flat files. Most performance experiments have shown that SQL-based approaches are inferior to main-memory algorithms. However, the current trend of database vendors to integrate analysis functionality into their query execution and optimization components, i.e., "closer to the data," suggests revisiting these results and searching for new, potentially better solutions. We investigate approaches based on SQL-92 and present a new approach called Quiver that employs universal and existential quantifications. This approach uses a table layout for itemsets in which a group of multiple records represents a single itemset. Such a vertical layout is similar to the popular layout used for the transaction table, which is the input of frequent itemset discovery. Our approach is particularly beneficial if the database system in use provides adequate strategies and techniques for processing universally quantified queries, which current commercial systems do not.
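
As an illustration of the ideas in the abstract, the following is a minimal sketch of support counting over a vertical layout, where both transactions and candidate itemsets are stored as one row per item. It is not the Quiver query from the paper: the table names (`transactions`, `candidates`), the column names, the sample data, and the helper `frequent_itemsets` are all assumptions made for this example. Universal quantification ("every item of the candidate occurs in the transaction") is expressed with the usual double `NOT EXISTS` idiom, run here through Python's `sqlite3` module.

```python
import sqlite3

# Hypothetical schema: both tables use a vertical layout, so a group of
# rows sharing the same tid (or itemset id) represents one set of items.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (tid INTEGER, item TEXT);
    CREATE TABLE candidates (itemset INTEGER, item TEXT);
""")

# Made-up sample data: four transactions and three candidate 2-itemsets.
conn.executemany("INSERT INTO transactions VALUES (?, ?)", [
    (1, "a"), (1, "b"), (1, "c"),
    (2, "a"), (2, "b"),
    (3, "a"), (3, "c"),
    (4, "b"),
])
conn.executemany("INSERT INTO candidates VALUES (?, ?)", [
    (10, "a"), (10, "b"),
    (20, "a"), (20, "c"),
    (30, "b"), (30, "c"),
])

# A transaction t supports a candidate c if there is no item of c that is
# missing from t. The inner NOT EXISTS is the existential test, the outer
# NOT EXISTS turns it into a universal one.
SUPPORT_QUERY = """
SELECT c.itemset, COUNT(*) AS support
FROM (SELECT DISTINCT itemset FROM candidates) AS c,
     (SELECT DISTINCT tid FROM transactions) AS t
WHERE NOT EXISTS (
    SELECT 1 FROM candidates AS ci
    WHERE ci.itemset = c.itemset
      AND NOT EXISTS (
          SELECT 1 FROM transactions AS ti
          WHERE ti.tid = t.tid AND ti.item = ci.item))
GROUP BY c.itemset
HAVING COUNT(*) >= ?
ORDER BY c.itemset
"""

def frequent_itemsets(connection, min_support):
    """Return (itemset id, support) pairs that meet the minimum support."""
    return connection.execute(SUPPORT_QUERY, (min_support,)).fetchall()

print(frequent_itemsets(conn, 2))
```

With the sample data, candidates 10 and 20 are each contained in two transactions, while candidate 30 is contained in only one. The nested subqueries are evaluated per (candidate, transaction) pair, which hints at why the abstract stresses the need for dedicated processing strategies for universally quantified queries.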

Similar resources

Query processing concepts and techniques for set containment tests

Relational division is an operator of the relational algebra that realizes universal quantifications in queries against a relational database. Expressing a universal quantification problem in SQL is cumbersome. If the division operator had a counterpart in a query language, a more intuitive formulation of universal quantification problems would be possible. Although division is a derived...

Computing Frequent Itemsets Inside Oracle 10G

Frequent itemset counting is the first step for most association rule algorithms and some classification algorithms. It is the process of counting the number of occurrences of a set of items that happen across many transactions. The goal is to find those items which occur together most often. Expressing this functionality in RDBMS engines is difficult for two reasons. First, it leads to extreme...

Accelerating Closed Frequent Itemset Mining by Elimination of Null Transactions

The mining of frequent itemsets is often challenged by the length of the patterns mined and also by the number of transactions considered for the mining process. Another acute challenge that concerns the performance of any association rule mining algorithm is the presence of "null" transactions. This work proposes a closed frequent itemset mining algorithm viz., Closed Frequent Itemset Mining a...

AMKIS: An Algorithm for Association Mining

Mining frequent items and itemsets is a daunting task in large databases and has attracted research attention in recent years. Generating a specific itemset, a K-itemset having K items, is an interesting research problem in data mining and knowledge discovery. In this paper, we propose an algorithm, named AMKIS, for generating frequent K-itemset patterns in large databases. AMKIS ...

Itemset Support Queries Using Frequent Itemsets and Their Condensed Representations

The purpose of this paper is two-fold: First, we give efficient algorithms for answering itemset support queries for collections of itemsets from various representations of the frequency information. As index structures we use itemset tries of transaction databases, frequent itemsets and their condensed representations. Second, we evaluate the usefulness of condensed representations of frequent...

Publication date: 2004